Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation

Authors

  • Ramya Rasipuram
  • Mathew Magimai-Doss
Abstract

In recent work, we proposed an acoustic data-driven grapheme-to-phoneme (G2P) conversion approach, in which the probabilistic relationship between graphemes and phonemes, learned from acoustic data, is used together with the orthographic transcription of words to infer their phoneme sequences. In this paper, we extend our studies to the under-resourced lexicon development problem. More precisely, given a small amount of transcribed speech data covering a few words, along with the pronunciation lexicon for those words, the goal is to build a pronunciation lexicon for unseen words. In this framework, we compare our G2P approach with the standard letter-to-sound (L2S) rule-based conversion approach. We evaluated the generated lexicons on the PhoneBook 600-word task in terms of pronunciation errors and ASR performance. The G2P approach yields a best ASR performance of 14.0% word error rate (WER), while the L2S approach yields a best ASR performance of 13.7% WER. A combination of the G2P and L2S approaches yields a best ASR performance of 9.3% WER.
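The abstract does not say how the G2P and L2S lexicons are combined or how pronunciation errors are counted. The following is a minimal sketch under two stated assumptions: that the combination is a simple union of pronunciation variants per word, and that pronunciation error is a phoneme-level edit distance against a reference lexicon, scored on the closest variant. The names `merge_lexicons`, `edit_distance`, and `phoneme_error_rate` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Pron = Tuple[str, ...]            # one pronunciation: a sequence of phoneme symbols
Lexicon = Dict[str, List[Pron]]   # word -> list of pronunciation variants


def merge_lexicons(a: Lexicon, b: Lexicon) -> Lexicon:
    """Union of pronunciation variants per word (an assumed combination scheme)."""
    merged: Lexicon = {}
    for word in sorted(set(a) | set(b)):
        variants: List[Pron] = []
        for pron in a.get(word, []) + b.get(word, []):
            if pron not in variants:   # drop duplicates, keep first-seen order
                variants.append(pron)
        merged[word] = variants
    return merged


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(reference: Dict[str, Pron], lexicon: Lexicon) -> float:
    """Phoneme errors of the closest variant per word, divided by reference phonemes.

    Scoring the closest variant is an optimistic (oracle) choice made for this sketch.
    """
    errors = 0
    total = 0
    for word, ref in reference.items():
        variants = lexicon.get(word) or [()]   # missing word: every phoneme is an error
        errors += min(edit_distance(ref, v) for v in variants)
        total += len(ref)
    return errors / total if total else 0.0
```

Under these assumptions, a merged lexicon can only match or improve the closest-variant error of either input, since it contains all of their variants. Its effect on WER is a separate matter, because extra variants can also increase confusability in decoding.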


Similar resources

Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations. In this paper, we describe a system for automatically obtaining pronunciations of words for which pronunciations are not available, but for which transcribed data exists. Our method integrates information from the letter sequence and from the acoustic evidence. The novel aspec...


Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systems

Modern speech recognizers rely on three core components: an acoustic model, a language model, and a pronunciation lexicon. In order to expand speech recognition capability to low-resource languages and domains, techniques to peel away the expert knowledge required to craft these three components have been growing in popularity. In this paper, we present a method for automatically learning a weig...


Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework

One of the primary steps in building automatic speech recognition (ASR) as well as text-to-speech systems is the development of a phonemic lexicon that provides a mapping between each word and its pronunciation as a sequence of phonemes. Phoneme lexicons can be developed by humans through the use of linguistic knowledge; however, this would be a costly and time-consuming task. To facilitate this proces...


Combining linguistic knowledge and acoustic information in automatic pronunciation lexicon generation

This paper describes several experiments aimed at the long term goal of enabling a spoken conversational system to automatically improve its pronunciation lexicon over time through direct interactions with end users and from available Web sources. We selected a set of 200 rare words from the OGI corpus of spoken names, and performed several experiments combining spelling and pronunciation infor...


Lexicon Optimization for WFST-Based Speech Recognition Using Acoustic Distance Based Confusability Measure and G2P Conversion

In this paper, we propose a lexicon optimization method based on a confusability measure (CM) to develop a large vocabulary continuous speech recognition (LVCSR) system with unseen words. When a lexicon is built or expanded for unseen words by using grapheme-to-phoneme (G2P) conversion, the lexicon size increases since G2P is generally realized by 1-to-N-best mapping. Thus, the proposed method ...




Publication date: 2012